-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clean up ast::Attribute
, ast::CrateConfig
, and string interning
#37824
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM modulo comments. cc @rust-lang/compiler
@@ -185,31 +185,31 @@ pub enum DefPathData { | |||
/// An impl | |||
Impl, | |||
/// Something in the type NS | |||
TypeNs(InternedString), | |||
TypeNs(Symbol), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that these are InternedString
so they hash their contents instead of the integer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I got overzealous with the InternedString
-> Symbol
refactoring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe make a type SymbolStringContent
?
@@ -717,16 +717,16 @@ impl<'a, 'tcx, 'gcx> PolyExistentialProjection<'tcx> { | |||
self.0.item_name // safe to skip the binder to access a name | |||
} | |||
|
|||
pub fn sort_key(&self, tcx: TyCtxt<'a, 'gcx, 'tcx>) -> (u64, InternedString) { | |||
pub fn sort_key(&self, tcx: TyCtxt<'a, 'gcx, 'tcx>) -> (u64, Symbol) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, the intention is to sort the strings themselves.
@@ -0,0 +1,284 @@ | |||
// Copyright 2012 The Rust Project Developers. See the COPYRIGHT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Year is 2016.
520c3c9
to
fc519bb
Compare
|
||
impl Decodable for InternedString { | ||
fn decode<D: Decoder>(d: &mut D) -> Result<InternedString, D::Error> { | ||
Ok(Symbol::intern(&d.read_str()?).as_str()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a suspicious line.
InternedString
is supposed to be simply a wrapper around Rc<str>
now not tied to symbol interner, right? InternedString::new
for example just creates a new Rc
and intern_and_get_ident
was removed by this PR.
At the same time decode
not only creates an Rc
but also pushes it into the interner.
One of them is probably wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
InternedString::new
should probably go away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It depends on what the remaining uses of InternedString
are.
Is it used only for hashing/encoding/etc Symbol
s as strings? In this case something like SymbolStringContent
is probably a better solution and InternedString
should go away completely, including new
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After this PR, InternedString
should be semantically equivalent to SymbolStringContent
-- we can rename later. I don't think InternedString
is that bad of a name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed InternedString::new
.
The one other thing I'd like to see is a |
☔ The latest upstream changes (presumably #37732) made this pull request unmergeable. Please resolve the merge conflicts. |
be6cd9d
to
fbb1713
Compare
@eddyb this should be ready to land now -- the second commit and the last three commits new since last review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r=me with comments addressed.
@@ -574,7 +574,7 @@ pub struct GlobalCtxt<'tcx> { | |||
|
|||
/// Map from function to the `#[derive]` mode that it's defining. Only used | |||
/// by `proc-macro` crates. | |||
pub derive_macros: RefCell<NodeMap<token::InternedString>>, | |||
pub derive_macros: RefCell<NodeMap<InternedString>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't this and crate_name
above be Symbol
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, done.
@@ -2344,7 +2344,7 @@ impl<'a, 'gcx, 'tcx> TyCtxt<'a, 'gcx, 'tcx> { | |||
if let Some(id) = self.map.as_local_node_id(id) { | |||
self.map.name(id) | |||
} else if id.index == CRATE_DEF_INDEX { | |||
token::intern(&self.sess.cstore.original_crate_name(id.krate)) | |||
Symbol::intern(&self.sess.cstore.original_crate_name(id.krate)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like original_crate_name
should return a Symbol
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -640,23 +612,20 @@ impl RustcDefaultCalls { | |||
let allow_unstable_cfg = UnstableFeatures::from_environment() | |||
.is_nightly_build(); | |||
|
|||
for cfg in &sess.parse_sess.config { | |||
if !allow_unstable_cfg && GatedCfg::gate(cfg).is_some() { | |||
for &(name, ref value) in sess.parse_sess.config.iter() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, doesn't this expose HashSet
ordering? Even if you were to use a BTreeSet
it would expose intern order. Maybe collect strings instead of printing them, to a Vec
and just sort that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -895,12 +895,12 @@ impl<'a> CrateLoader<'a> { | |||
n.check_name("name") | |||
}).and_then(|a| a.value_str()); | |||
let n = match n { | |||
Some(n) => n, | |||
Some(n) => n.as_str().to_string(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't this be written as .to_string()
directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but no longer relevant after using Symbol
instead of String
in NativeLibrary
.
@@ -321,7 +321,7 @@ impl<'bcx, 'tcx> MirContext<'bcx, 'tcx> { | |||
|
|||
// Get the location information. | |||
let loc = bcx.sess().codemap().lookup_char_pos(span.lo); | |||
let filename = token::intern_and_get_ident(&loc.file.name); | |||
let filename = Symbol::intern(&loc.file.name).as_str(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might make sense to change loc.file.name
to Symbol
- in fact, any String
in the compiler that is not on some transient and/or error path, may benefit from it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, but best for a future PR.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)] | ||
pub struct Symbol(pub u32); | ||
|
||
impl !Send for Symbol { } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a comment about its thread locality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
||
/// A symbol is an interned or gensymed string. | ||
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)] | ||
pub struct Symbol(pub u32); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pub
field is dangerous, you'll have to privatize it before attempting any optimizations (if the fallout is minimal, maybe do it here).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
|
||
// If an interner exists in TLS, return it. Otherwise, prepare a fresh one. | ||
// FIXME(eddyb) #8726 This should probably use a thread-local reference. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comment is no longer relevant (we switched from returning Rc
to taking a closure).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
/// Reset the ident interner to its initial state. | ||
pub fn reset_interner() { | ||
with_interner(|interner| *interner = Interner::fresh()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unsound: you keep references to the string contents around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made reset_interner
unsafe
. It isn't used in rustc
itself, but rusti
wants to be able to reset thread local state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a real point to that? The interner would just have a "warm start".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like rusti
isn't using driver::reset_thread_local_state
at all anymore, so probably not (removed).
/// somehow. | ||
#[derive(Clone, PartialEq, Hash, PartialOrd, Eq, Ord)] | ||
pub struct InternedString { | ||
string: &'static str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eventual optimization opportunity: store the CodeMap
in the interner, guarantee this "symbol string contents" type always points to something owned by the interner, use it in place of &str
in the lexer, with appropriate slicing & whatnot, and allow interning it without allocating again - although this requires the arena-based interner too.
cc @rust-lang/compiler |
6698d2c
to
5400c4f
Compare
@bors r+ |
📌 Commit 5400c4f has been approved by |
⌛ Testing commit 5400c4f with merge 531bd42... |
@bors r=eddyb |
📌 Commit 60b5372 has been approved by |
⌛ Testing commit 60b5372 with merge a56532f... |
💔 Test failed - auto-win-msvc-64-opt-rustbuild |
@bors r=eddyb |
📌 Commit a8e86f0 has been approved by |
Clean up `ast::Attribute`, `ast::CrateConfig`, and string interning This PR - removes `ast::Attribute_` (changing `Attribute` from `Spanned<Attribute_>` to a struct), - moves a `MetaItem`'s name from the `MetaItemKind` variants to a field of `MetaItem`, - avoids needlessly wrapping `ast::MetaItem` with `P`, - moves string interning into `syntax::symbol` (`ast::Name` is a reexport of `symbol::Symbol` for now), - replaces `InternedString` with `Symbol` in the AST, HIR, and various other places, and - refactors `ast::CrateConfig` from a `Vec` to a `HashSet`. r? @eddyb
“Clean up `ast::Attribute`, `ast::CrateConfig`, and string interning” Resolves these errors: error[E0425]: unresolved name `token::intern` --> src/lib.rs:27:35 | 27 | reg.register_syntax_extension(token::intern("quickcheck"), | ^^^^^^^^^^^^^ unresolved name error[E0425]: unresolved name `token::str_to_ident` --> src/lib.rs:86:23 | 86 | let check_ident = token::str_to_ident("quickcheck"); | ^^^^^^^^^^^^^^^^^^^ unresolved name error[E0425]: unresolved name `token::intern_and_get_ident` --> src/lib.rs:107:34 | 107 | span, cx.meta_word(span, token::intern_and_get_ident("test")))); | ^^^^^^^^^^^^^^^^^^^^^^^^^^^ unresolved name error: aborting due to 3 previous errors Signed-off-by: Anders Kaseorg <andersk@mit.edu>
“Clean up `ast::Attribute`, `ast::CrateConfig`, and string interning” Resolves these errors: error[E0425]: unresolved name `token::intern` --> src/lib.rs:27:35 | 27 | reg.register_syntax_extension(token::intern("quickcheck"), | ^^^^^^^^^^^^^ unresolved name error[E0425]: unresolved name `token::str_to_ident` --> src/lib.rs:86:23 | 86 | let check_ident = token::str_to_ident("quickcheck"); | ^^^^^^^^^^^^^^^^^^^ unresolved name error[E0425]: unresolved name `token::intern_and_get_ident` --> src/lib.rs:107:34 | 107 | span, cx.meta_word(span, token::intern_and_get_ident("test")))); | ^^^^^^^^^^^^^^^^^^^^^^^^^^^ unresolved name error: aborting due to 3 previous errors Signed-off-by: Anders Kaseorg <andersk@mit.edu>
This PR
ast::Attribute_
(changingAttribute
fromSpanned<Attribute_>
to a struct),MetaItem
's name from theMetaItemKind
variants to a field ofMetaItem
,ast::MetaItem
withP
,syntax::symbol
(ast::Name
is a reexport ofsymbol::Symbol
for now),InternedString
withSymbol
in the AST, HIR, and various other places, andast::CrateConfig
from aVec
to aHashSet
.r? @eddyb